The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the prospect of prize money played only a minor role (16%). Although a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
In this work, we propose a self-supervised multi-agent system, termed the memory-like adaptive modeling multi-agent learning system (MAMMALS), that performs online learning for behavioral pattern clustering on time series. Visual behaviors are encoded as discrete time series (DTS), which are then trained on and modeled in the multi-agent system in a bio-memory-like form. We implemented a fully decentralized multi-agent system design framework and verified its feasibility in a surveillance video application scenario on vehicle path clustering. In multi-agent learning, learning methods designed for individual agents typically perform poorly at the global level because they ignore the synergy between agents.
Recently, webly supervised learning (WSL) has been studied as a way to leverage the abundant and easily accessible data on the Internet. Most existing methods focus on learning noise-robust models from web images while neglecting the performance drop caused by the differences between the web domain and the real-world domain. However, only by closing this performance gap can we fully exploit the practical value of web datasets. To this end, we propose a Few-shot guided Prototypical (FoPro) representation learning method, which needs only a few labeled examples from reality and can significantly improve performance in the real-world domain. Specifically, we initialize each class center with few-shot real-world data as the "realistic" prototype. Then, the intra-class distance between web instances and "realistic" prototypes is narrowed by contrastive learning. Finally, we measure image-prototype distance with a learnable metric. Prototypes are polished by adjacent high-quality web images and are used to remove distant out-of-distribution samples. In experiments, FoPro is trained on web datasets under the guidance of a few real-world examples and evaluated on real-world datasets. Our method achieves state-of-the-art performance on three fine-grained datasets and two large-scale datasets. Compared with existing WSL methods under the same few-shot settings, FoPro still excels in real-world generalization. Code is available at https://github.com/yuleiqin/fopro.
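The prototype steps described in this abstract can be illustrated with a minimal sketch. It is not the FoPro implementation: the function names (`init_prototypes`, `keep_clean`), the fixed threshold, and the use of plain cosine similarity in place of the learnable metric are all assumptions made for illustration, and the contrastive training and prototype polishing steps are omitted.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(u, v):
    """Cosine similarity, used here as a stand-in for a learnable metric."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))

def init_prototypes(few_shot):
    """Initialize one prototype per class as the normalized mean of the
    few-shot real-world feature vectors (few_shot: label -> list of vectors)."""
    protos = {}
    for label, feats in few_shot.items():
        dim = len(feats[0])
        mean = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
        protos[label] = l2_normalize(mean)
    return protos

def keep_clean(web_samples, protos, threshold=0.5):
    """Drop web samples that lie far from the prototype of their (noisy) label.
    The threshold is a hypothetical hyperparameter."""
    return [(f, y) for f, y in web_samples if cosine(f, protos[y]) >= threshold]

# Toy 2-D features: two real "cat" examples and one real "dog" example.
few_shot = {"cat": [[1.0, 0.0], [0.8, 0.2]], "dog": [[0.0, 1.0]]}
protos = init_prototypes(few_shot)

# The second web image is labeled "cat" but looks like a dog, so it is removed.
web = [([0.9, 0.1], "cat"), ([0.0, 1.0], "cat")]
clean = keep_clean(web, protos)
```

In this toy setup, only the first web sample survives the filter, since the second one is nearly orthogonal to the "cat" prototype.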
Self-supervised monocular depth estimation (MDE) models universally suffer from the notorious edge-fattening issue. Triplet loss, a widespread metric learning strategy, has largely succeeded in many computer vision applications. In this paper, we redesign the patch-based triplet loss in MDE to alleviate the ubiquitous edge-fattening issue. We show two drawbacks of the raw triplet loss in MDE and demonstrate our problem-driven redesigns. First, we present a min-operator-based strategy applied to all negative samples, to prevent well-performing negatives from sheltering the error of edge-fattening negatives. Second, we split the anchor-positive distance and the anchor-negative distance from within the original triplet, which directly optimizes the positives without any mutual effect with the negatives. Extensive experiments show that the combination of these two small redesigns can achieve unprecedented results: our powerful and versatile triplet loss not only makes our model outperform all previous state-of-the-art (SoTA) methods by a large margin, but also provides substantial performance boosts to a large number of existing models, while introducing no extra inference computation at all.
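The two redesigns described in this abstract can be read as follows; this is a hedged sketch of one plausible formulation, not the paper's exact loss. It assumes the distances are already computed per patch, that the baseline averages the negative distances, and that the decoupled form is `d_pos + max(0, margin - d_neg)`. The function names and the margin value are hypothetical.

```python
def raw_triplet(d_pos, d_negs, margin):
    """Baseline (assumed): hinge on anchor-positive distance minus the
    AVERAGE anchor-negative distance."""
    d_neg = sum(d_negs) / len(d_negs)
    return max(0.0, d_pos - d_neg + margin)

def redesigned_triplet(d_pos, d_negs, margin):
    """Sketch of the two redesigns:
    1) take the MIN over negatives so the worst negative is never hidden
       behind well-separated ones;
    2) decouple the terms so d_pos is always minimized, regardless of how
       the negatives behave."""
    d_neg = min(d_negs)
    return d_pos + max(0.0, margin - d_neg)

# Two negatives are well separated, one is nearly as close as the positive
# (the kind of negative an edge-fattening error would produce).
d_pos, d_negs, margin = 0.1, [1.0, 1.0, 0.1], 0.5

baseline = raw_triplet(d_pos, d_negs, margin)           # hinge inactive, no gradient
redesigned = redesigned_triplet(d_pos, d_negs, margin)  # still penalized
```

With these toy numbers the averaged baseline evaluates to zero, so the bad negative contributes no training signal, whereas the min-based, decoupled version remains positive.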
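Modeling sparse and dense image matching within a unified functional correspondence model has recently attracted growing research interest. However, existing efforts focus mainly on improving matching accuracy while ignoring efficiency, which is crucial for real-world applications. In this paper, we propose an efficient structure that finds correspondences in a coarse-to-fine manner, significantly improving the efficiency of the functional correspondence model. To achieve this, multiple transformer blocks are connected stage-wise to gradually refine the predicted coordinates on top of a shared multi-scale feature extraction network. Given a pair of images and arbitrary query coordinates, all correspondences are predicted within a single feed-forward pass. We further propose an adaptive query-clustering strategy and an uncertainty-based outlier detection module that work with the proposed framework for faster and better predictions. Experiments on various sparse and dense matching tasks demonstrate the superiority of our method over existing state-of-the-art approaches in both efficiency and effectiveness.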
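With smart meters, retailers can collect large amounts of data on consumer behavior. From the collected data, retailers can derive household profile information and implement demand response. Although retailers prefer to obtain models that are as accurate as possible across different customers, there are two major challenges. First, different retailers in the retail market do not share their consumers' electricity consumption data, because these data are regarded as their assets, which leads to the problem of data islands. Second, because different retailers may serve various kinds of consumers, the electricity load data are highly heterogeneous. To this end, a fully distributed short-term load forecasting framework based on a consensus algorithm and long short-term memory (LSTM) is proposed, which can protect customers' privacy while meeting the requirements of accurate load forecasting. Specifically, a fully distributed learning framework is used for distributed training, and a consensus technique is adopted to preserve confidentiality and privacy. Case studies show that the proposed method achieves accuracy comparable to that of centralized methods, while offering advantages in training speed and data privacy.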
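The role of a consensus technique in fully distributed training can be sketched with the textbook average-consensus update, in which each node repeatedly nudges its parameters toward those of its direct neighbors. This is a generic illustration, not the protocol of the paper: the LSTM training itself is omitted, the topology and step size `eps` are hypothetical, and the update rule `x_i <- x_i + eps * sum_j (x_j - x_i)` is assumed.

```python
def consensus_step(params, neighbors, eps):
    """One synchronous average-consensus update.
    params:    list of parameter vectors, one per node (e.g. retailer)
    neighbors: neighbors[i] is the list of nodes that node i can talk to
    eps:       step size; for an undirected connected graph it should be
               smaller than 1 / (maximum node degree)"""
    new_params = []
    for i, p in enumerate(params):
        updated = list(p)
        for j in neighbors[i]:
            for k in range(len(p)):
                updated[k] += eps * (params[j][k] - p[k])
        new_params.append(updated)
    return new_params

# Three nodes on a line (0 - 1 - 2): no central server, and nodes 0 and 2
# never communicate directly. Only parameters are exchanged, never raw data.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
params = [[1.0, 10.0], [2.0, 20.0], [6.0, 30.0]]

for _ in range(100):
    params = consensus_step(params, neighbors, eps=0.3)
```

After enough iterations every node holds approximately the network-wide average of the initial parameters, here about `[3.0, 20.0]`, even though no node ever saw all of them at once.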
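Open World Object Detection (OWOD) is a challenging computer vision problem that requires detecting unknown objects and incrementally learning the identified unknown classes. However, it cannot distinguish unknown instances as multiple unknown classes. In this work, we propose a novel OWOD problem called Unknown-Classified Open World Object Detection (UC-OWOD). UC-OWOD aims to detect unknown instances and classify them into different unknown classes. In addition, we formulate the problem and design a two-stage object detector to solve UC-OWOD. First, an unknown label-aware proposal and an unknown-discriminative classification head are used to detect known and unknown objects. Then, similarity-based unknown classification and unknown clustering refinement modules are constructed to distinguish multiple unknown classes. Moreover, two novel evaluation protocols are designed to evaluate unknown-class detection. Abundant experiments and visualizations demonstrate the effectiveness of the proposed method. Code is available at https://github.com/johnwuzh/uc-owod.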
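Pose estimation is important for robot perception, path planning, and related tasks. Robot poses can be modeled on matrix Lie groups and are usually estimated with filter-based methods. In this paper, we establish the error formulation of the invariant extended Kalman filter (IEKF) in the presence of random noise and apply it to vision-aided inertial navigation. We evaluate our algorithm through numerical simulations and experiments on the OpenVINS platform. Both the simulations and the experiments, performed on the public EuRoC MAV dataset, show that our algorithm outperforms some state-of-the-art filter-based methods, such as the quaternion-based EKF and the first-estimates Jacobian EKF.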
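Neural volume rendering enables photo-realistic renderings of a human performer in free view, a critical task in immersive VR/AR applications. However, the practice is severely limited by the high computational cost of the rendering process. To solve this problem, we propose UV Volumes, a new approach that can render an editable free-view video of a human performer in real time. It separates the high-frequency (i.e., non-smooth) human appearance from the 3D volume and encodes it into 2D neural texture stacks (NTS). The smooth UV volumes allow much smaller and shallower neural networks to obtain densities and texture coordinates in 3D, while the detailed appearance is captured in the 2D NTS. For editability, the mapping between the parameterized human model and the smooth texture coordinates allows better generalization to novel poses and shapes. Furthermore, the use of NTS enables interesting applications such as retexturing. Extensive experiments on the CMU Panoptic, ZJU Mocap, and H36M datasets show that our model can render 960 * 540 images at 30 FPS on average, with photo-realism comparable to state-of-the-art methods. The project and supplementary materials are available at https://github.com/fanegg/uv-volumes.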
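In this work, we propose a framework for single-view hand mesh reconstruction that simultaneously achieves high reconstruction accuracy, fast inference speed, and temporal coherence. Specifically, for 2D encoding, we propose lightweight yet effective stacked structures. For 3D decoding, we provide an efficient graph operator, namely depth-separable spiral convolution. Moreover, we present a novel feature lifting module for bridging the gap between 2D and 3D representations. This module begins with a map-based position regression (MapReg) block, which integrates the merits of both the heatmap encoding and position regression paradigms to improve 2D accuracy and temporal coherence. MapReg is followed by pose pooling and a pose-to-vertex lifting approach, which transform 2D pose encodings into semantic features of 3D vertices. Overall, our hand reconstruction framework, called MobRecon, combines affordable computational cost with a miniature model size and reaches a high inference speed of 83 FPS on an Apple A14 CPU. Extensive experiments on popular datasets such as FreiHAND, RHD, and HO3Dv2 demonstrate that MobRecon achieves superior performance in reconstruction accuracy and temporal coherence. Our code is publicly available at https://github.com/seanchenxy/handmesh.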